mtDNA data mining in GenBank needs surveying.

Authors

  • Yong-Gang Yao
  • Antonio Salas
  • Ian Logan
  • Hans-Jürgen Bandelt
Abstract

To the Editor: Since the first sequencing of the complete human mtDNA genome, both the sequencing techniques and the quality of commercial kits have improved greatly. This has led to a growing number of reports of complete mtDNA sequences from the fields of molecular anthropology, medical genetics, and forensic science; and there are now over 6700 complete or near-complete mtDNA sequences available for study. However, in comparison to the pioneer manual-sequencing efforts in the early nineties, the overall mtDNA data quality, especially in the medical field, is still far from satisfactory. Sequencing errors and inadvertent mistakes in the reported mtDNA data are not infrequent. Deficient mtDNA data sets of complete genomes can have important consequences for the conclusions achieved in many studies and may also pose problems for any subsequent reanalyses. Most recently, Pereira and colleagues discussed the overall picture of the mtDNA genome diversity in worldwide human populations with a comprehensive reanalysis of 5140 published complete or near-complete (lacking some control region information) mtDNA sequences. Their study represents an important advance in defining the effects of gene structures on limiting mtDNA diversity and may have valuable implications for mtDNA studies in the medical field. However, all of the data used in the study by Pereira et al. were directly retrieved from GenBank without any scrutiny for problematic or flawed data that should have been excluded. Many of the mtDNA sequences analyzed in their study have in fact already been questioned in the literature or even corrected by their authors, but unfortunately, in several instances the new corrected versions of the sequences have not been made generally available or updated in GenBank. In Table 1, we list some of the problematic data sets and single sequences used by Pereira et al. in their study.
Among them is the original data set of Herrnstadt et al., which was announced by the authors as having been corrected, although the new sequences have never been entered into GenBank. Portions of those coding-region data (in either corrected or uncorrected form) were augmented by the associated control-region data and published in several papers; thus, none of these expanded data can be downloaded from GenBank, and they have to be retrieved from the figures in the corresponding articles. To cite a more recent example, the African mtDNA data set published by Gonder et al. is of particularly poor quality. These sequences are incompletely recorded (as already mentioned by Behar et al.); the most extreme instance of this is the haplogroup L0k1 sequence EF184609 that lacks as many as 25 expected variants scattered along the


Similar resources

Control Control Control: A Reassessment and Comparison of GenBank and Chromatogram mtDNA Sequence Variation in Baltic Grey Seals (Halichoerus grypus)

Genetic data can provide a powerful tool for those interested in the biology, management and conservation of wildlife, but also lead to erroneous conclusions if appropriate controls are not taken at all steps of the analytical process. This particularly applies to data deposited in public repositories such as GenBank, whose utility relies heavily on the assumption of high data quality. Here we ...


Pseudogenization of the Humanin gene is common in the mitochondrial DNA of many vertebrates

In the human the peptide Humanin is produced from the small Humanin gene which is embedded as a gene-within-a-gene in the 16S ribosomal molecule of the mitochondrial DNA (mtDNA). The peptide itself appears to be significant in the prevention of cell death in many tissues and improve cognition in animal models. By using simple data mining techniques, it is possible to show that 99.4% of the huma...


Phylogeny of gazelles in some islands of Iran based on mtDNA sequences: Species identification and implications for conservation

Different species of gazelles are among the most endangered mammals on the Asian steppes and occur in the central, southern and northwestern regions of Iran. The previous conservation efforts in this region have been incomplete due to confusion about the phylogenetic relationship among various populations. As a result, different conservation programs such as ex-situ breeding and transfer of captive...


Letter to the Editor: Cytochrome c Oxidase Sequence Comparisons Suggest an Unusually High Rate of Mitochondrial DNA Evolution in Mytilus (Mollusca: Bivalvia)

Blue mussels of the genus Mytilus have an unusual mode of mitochondrial DNA (mtDNA) transmission: males receive mtDNA from both parents and transmit their paternal mtDNA to their sons; females receive mtDNA only from their mother (Skibinski, Gallagher, and Beynon 1994a, 1994b; Zouros et al. 1994a, 1994b). This mode of mtDNA transmission has been termed “doubly uniparental inheritance” (DUI; Z...


Mitochondrial genomes of domestic animals need scrutiny.

More than 1000 complete or near-complete mitochondrial DNA (mtDNA) sequences have been deposited in GenBank for eight common domestic animals (cattle, dog, goat, horse, pig, sheep, yak and chicken) and their close wild ancestors or relatives, as well. Nevertheless, few efforts have been performed to evaluate the sequence data quality. Herein, we conducted a phylogenetic survey of these complete...



Journal title:
  • American journal of human genetics

Volume 85, Issue 6

Publication date: 2009